This report describes the efforts undertaken to improve the experimental
model "Voice Sound Recognizer" originally built under Contract AF
30602-67-C-0300. This equipment utilized the techniques of Single Equivalent
Formant parameter extraction, phonemic category recognition, and
category-sequence word recognition.

Extensive  hardware  and  software  modifications  to  the  basic  recognizer 
system  were  made  during  the  program  which  include  the  use  of  semiautomatic 
speaker adaptation by means of a distance function defined by sets of phonemic
category  strings  and  nearest  neighbor  word  recognition  decisions. 

The final recognizer configuration displayed a reduced speaker sensitivity
and an average recognition rate for four speakers of [illegible] when using a
25-word vocabulary.
The objective of this program was to make necessary modifications
and improvements to an automatic isolated word recognizer fabricated
under a previous contract. Words are identified by recognizing basic
sound classes and their order or sequence of recognition. This program
was aimed at making the necessary hardware and software modifications
so that higher recognition rates could be achieved in recognizing
some of the more difficult sound categories. Previously, the nasal
category representing the sounds "m", "n", and "ng" was recognized
with a very low probability of correct recognition. During this effort
the reliability of the nasal category detector was greatly improved.

In  addition,  the  vowel  and  stop  detectors  were  significantly  modified 
to  improve  their  recognition  rates. 


Software programs were written so that the sound recognizer hardware
could be evaluated more efficiently using different speakers and
varying the quantity of library words.


The existing system is now capable of recognizing four different
speakers uttering 25 test words with word recognition rates varying
from 72% to 92%.


In  its  present  form,  it  is  a usable  system  as  an  interface  between 
man  and  machine  using  a preselected  vocabulary  of  approximately  25-30 
words. 


Word recognition rates could still be improved if a feedback system
were incorporated. The feedback system would utilize various linguistic
rules, a priori knowledge of the library words, and certain command
language restraints. Such a system could very well be implemented and
activate various systems by means of voice control.
EVALUATION


Word  recognition  is  performed  by  using  an  algorithm  which  uses 
distance  functions  generated  by  groups  of  sound  category  sequences  and 
nearest  neighbor  word  recognition  decisions. 
The building of a real-time isolated word speech recognizer, utilizing
the Single Equivalent Formant (SEF), phonemic category recognition and
category-sequence word recognition techniques, was accomplished during a
previous contract F30602-67-C-0310. This recognizer was entirely hardware
implemented and capable of recognizing a total of any 12 words chosen from a
larger 112-word vocabulary of Fortran programming words. The recognizer design
concepts were such that the vocabulary could be expanded to the full 112 words;
however, the equipment was limited to 12 words for reasons of implementation
cost. The changing of vocabulary words was accomplished by altering hardwired
category-sequence word-recognition logic to conform to design data obtained
from sample utterances of the desired vocabulary words. The design of this
recognizer and the underlying concepts are fully described in the final reports
for contracts F30602-67-C-0300 and AF 30(602)^170 and should be considered a
part of this report.

Test results from the phonemic category word recognizer demonstrated a
tendency towards speaker sensitivity. As shown in Table 1, the equipment
achieved a recognition rate of 93% (for a 12-word vocabulary) during the
testing of the speaker who supplied word logic design data. However, for new
speakers, the recognition rates dropped to the [illegible] region. To overcome
this problem, RADC desired that three efforts be undertaken during the program
described in this report:


1. Improvement of the existing phonemic
category recognition logic.


2. Implementation of the category-sequence
word recognition logic in software.


3. Provide a means to adapt the recognizer
to individual speakers.


The first of these efforts was directed towards finding and eliminating
general problems in the phonemic category detectors. The latter two efforts
were undertaken to reduce the speaker sensitivity of the word logic. The
technique of adaptation to individual speakers was felt to be a reasonable
system compromise in applications where the speaker's speech characteristics
can be determined prior to use. This is, of course, practical only if the
process of adaptation is accomplished rapidly and automatically.

Early in the program, a technique for word logic adaptation was conceived
and tested by RADC project personnel. It proved feasible for use in
the recognizer and was subsequently used during the program. The technique
involved the generation of a "training library" by each speaker, which
consisted of a number of samples for each word in the vocabulary. These
training samples are stored in the computer memory in the form of phonemic
category sequences. A separate training set is stored for exclusive use by each
speaker whenever he wishes to use the recognizer. These training sets are
used during recognition as examples against which the unknown test word must
be matched by the category-sequence word logic. The variations within an
individual speaker's training library had already been shown by the previous
program to be within the match capability of the category-sequence word
recognition logic. Thus, speaker-to-speaker variation was effectively
eliminated. Furthermore, training can be accomplished automatically and as
rapidly as the speaker can utter a set of library words.
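
The training-library idea can be illustrated with a short sketch. This is not the report's word logic (that is developed in Section III); it assumes, purely for illustration, that the distance between two category sequences is a plain edit distance, and the class name, word labels, and category sequences below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two category sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # drop a category
                           cur[j - 1] + 1,              # insert a category
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


class TrainingLibrary:
    """Per-speaker store of training samples, kept as category sequences."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[Tuple[int, ...]]] = {}

    def train(self, word: str, sequence: Sequence[int]) -> None:
        # Each utterance of a library word adds one stored example.
        self.samples.setdefault(word, []).append(tuple(sequence))

    def recognize(self, sequence: Sequence[int]) -> Tuple[str, int]:
        # Nearest-neighbor decision: the closest stored example wins.
        dist, word = min(
            (edit_distance(sequence, s), w)
            for w, stored in self.samples.items()
            for s in stored
        )
        return word, dist


library = TrainingLibrary()
library.train("STOP", [1, 4, 13, 9, 4])   # hypothetical sequences
library.train("SIX", [1, 13, 7, 4, 1])
result = library.recognize([1, 4, 13, 9])  # final stop not detected
```

Changing the vocabulary or the speaker only requires building a new `TrainingLibrary`, which mirrors the advantage the text attributes to the technique.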

Table 2 shows the typical improvement obtained by this technique at an
early point in its development. The test results are for three speakers
uttering 10-, 20-, and 30-word vocabularies, and show average recognition rates
of 91, 86, and 80 percent, respectively, and relatively small differences
from speaker to speaker.

The training library technique provides an additional important advantage
in that a vocabulary may also be changed as rapidly as the speaker can
utter a new set of training library samples.

Figure 1 shows the block diagram of the hardware/software speech
recognizer that evolved during the course of the program. It consists of the
phonemic category recognition logic, a computer interface, the RADC computer,
tape deck, and teletype output. The operational sequence consists of a
speaker uttering a set of training words into the phonemic category recognizer.
The detected category sequence is fed to the computer through the interface
Table 2

Recognition Rates for Individual Speakers
Using a Training Library for Logic Adaptation

              10-Word       20-Word       30-Word
              Vocabulary    Vocabulary    Vocabulary

Speaker #1    95%           90%           93%
Speaker #2    95%           90%           80%
Speaker #3    85%           80%           77%

Average       91%           86%           80%

and stored in memory. If the speaker wishes to use the recognizer at a later
date, his training library is transferred to tape so that he will not have to
generate a training library a second time. Once the training library has been
generated, the recognizer operates in the recognition mode, at which time the
software category sequence logic carries out a matching process on each word
as it is uttered. The teletype is used to print out the final recognition
decision. The entire operation is essentially a real-time process.

The phonemic category logic recognizes a total of 13 phonemic categories.
The categories are shown in Table 3 and are generally divided into
groups denoting manner of articulation, i.e., voiced stops, unvoiced stops,
nasals, etc. The vocabulary utilized during the program consisted of words
taken from the list of Fortran programming words shown in Table 4.

Section II of this report describes hardware modification while Section
III describes the word logic concepts and software developed during the
program. The remaining Sections IV and V describe the results, conclusions, and
recommendations.


Table 3

Category    Sounds

 1          s
 2          [illegible]
 3          f, [illegible], h
 4          p, t, k
 5          [illegible], b
 6          m, n, ng
 7          i, [illegible]
 8          [illegible]
 9          A, a, [illegible]
10          i, D, U
11          u, w
13          V - vowel
14          F - fricative
Table 4

Fortran Vocabulary

FUNCTION      EQUIVALENCE   COMMON        DATA
INTEGER       REAL          DOUBLE        PRECISION
COMPLEX       LOGICAL       RELEASE       REPORT
SYSOPS        ABNORMAL      MAGTAPE       NOCARDS
CPRO^AAM      MAP           EDIT          NODEBUG
BATCH         FORTRAN       LIB           TAPE
JOB           PROGRAM       ENDCOMP       CARDS
NOIDENTITY    CMPERR3       F4DUMP        HALT
COBOL         CONIN         RIM           TAC
LOW

Oil           ONE           TWO           THREE
FOUR          FIVE          SIX           SEVEN
EIGHT         NINE          ASTERISK      SLASH
PLUS          MINUS         PERIOD        COMMA
SEMICOLON     SPACE         EQUAL         OPEN
CLOSED        BRACKET

TRUE          FALSE         IDENTITY      NOT
LESS          THAN          GREATER

AND           GO            ASSIGN        IF
RETURN        END           DO            CONTINUE
PAUSE         STOP          COMPLETE      DIMENSION
SUBROUTINE    EXTERNAL      BLOCK         READ
PRINT         PUNCH         WRITE         ENDFILE
BACKSPACE     REWIND        FORMAT        CALL


SECTION II

PHONEMIC CATEGORY DETECTOR MODIFICATIONS


2.0 Introduction


Figure  2 shows  the  overall  logic  for  the  phonemic  category  recognizer 
(developed  during  the  previous  program).  There  are  four  basic  levels  of  logic. 
The first of these is the parameter extractor logic. A total of three
parameters are extracted: the SEF, the SEF amplitude, and voicing.

The  second  level  of  logic  is  the  detection  of  acoustic  features.  This 
level  concerns  itself  with  the  quantization  of  parameter  levels  and  slopes. 
These features are the simplest individual unit of information processed by
the  recognizer. 

The  third  level  of  logic  is  termed  acoustic  event  detection.  Strings 
of  features  are  combined  to  form  units  that  are  perceptually  significant  but 
do not necessarily possess the same perceptual value.

The  fourth  logic  level  is  the  detection  of  phonemic  categories.  In 
this  logic  the  various  acoustic  events  are  subdivided  and  then  recombined  to 
form  categories  of  identical  perceptual  events.  Figure  3 shows  in  greater 
detail  the  individual  functional  blocks  and  their  interconnections. 


2.1  Error  Analysis 


At the onset of the program, it was observed that the unvoiced stop
detector (category 4) did not respond for certain speakers. An investigation of
the problem uncovered a special case logical error in the release detector
(card 10) which is described in Section 2.3. This problem showed the need for
a general performance analysis of the phonemic category detectors.

This analysis consisted of examining the detected phonemic category
sequences, as printed out by the computer, for four speakers uttering one sample
each of 20 different words. The detected sequences were then compared with
the sequence which would be expected by considering the word pronunciation.
Confusion matrices were constructed from which the summary shown in Table 5
was derived. As shown, both the probability of detecting an uttered phoneme
and the probability that a detector response was the result of an uttered
phoneme are shown.
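
As a rough illustration of how these two probabilities can be derived from a confusion matrix, the sketch below tallies counts per category. The data layout (a dictionary keyed by uttered and detected category, with `None` marking a missed or an extra detection) and the counts are assumptions made for this example; they are not taken from Table 5.

```python
from collections import defaultdict


def category_stats(confusions):
    """Return {category: (p_detect, p_valid)} from confusion counts.

    confusions maps (uttered, detected) -> count, where None stands for
    "nothing uttered" (an extra detection) or "nothing detected" (a miss).
    p_detect: probability an uttered phoneme of the category was detected.
    p_valid:  probability a detector response came from an uttered phoneme.
    """
    uttered = defaultdict(int)
    detected = defaultdict(int)
    correct = defaultdict(int)
    for (u, d), n in confusions.items():
        if u is not None:
            uttered[u] += n
        if d is not None:
            detected[d] += n
        if u is not None and u == d:
            correct[u] += n
    stats = {}
    for c in set(uttered) | set(detected):
        p_detect = correct[c] / uttered[c] if uttered[c] else None
        p_valid = correct[c] / detected[c] if detected[c] else None
        stats[c] = (p_detect, p_valid)
    return stats


# Hypothetical counts: a nasal detector (6) that misses often but is
# rarely wrong, and a vowel detector (13) that fires too often.
example = {
    (6, 6): 4, (6, None): 16,
    (13, 13): 18, (13, None): 2, (None, 13): 12,
}
stats = category_stats(example)
```

With these made-up counts the nasal category shows a low detection probability but a high validity, while the vowel category shows the reverse, which is the pattern the error analysis describes.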

Three major problem areas are evident from this data. First and most
important, the nasal category (#6) has a very low probability of detection;
however, if it is detected, there is a high probability that the phoneme was
uttered. Secondly, vowels have just the reverse characteristic, producing
many "extra" vowel detections. Finally, the voiced stop detector (category
5) also tends to produce many extra voiced stop detections. The previously
mentioned unvoiced stop detector problem is not evident because a fix was made
for this problem before the error analysis was undertaken.

These tests prompted further circuit studies to eliminate these problems.
The following sections describe these efforts.


The original nasal category detector in the equipment delivered under the
previous contract worked by detecting excursions of the SEF signal
into the frequency region associated with the articulation of nasals. This
SEF region is detected by a quantizer circuit in cards 5, 6, and 7. The final
output from card 7 is designated [illegible] and requires the presence of the
SEF signal in the nasal region for more than 1[?]0 ms. This rather long duration
requirement was imposed to inhibit spurious nasal detections for certain short
duration noiselike excursions of the SEF signal into the nasal region.

Nasal detection failures observed during the performance analysis were
found to occur when the articulated nasal was less than 1[?]0 ms in duration.
This situation frequently occurs for mid-position nasals such as the second
nasal in the word "minus".

A new card was made and installed in the phonemic category recognizer
to help reduce this problem. (The card is located in position 15A in the
recognizer.) This card attempts to differentiate short mid-position nasal
characteristics from the previously mentioned spurious SEF excursions.

Figure 4 shows the functional block diagram for this card.

The logic utilizes the mid-nasal's characteristic small drop in the
amplitude and the excursion of the SEF into the nasal region during
this drop. The amplitude rise at the end of nasal articulation is detected
by the RC differentiator and threshold detector at the bottom of the figure.
Quantized SEF information from the hold card (#6) rather than the delay
card (#7) eliminates the 1[?]0 ms duration requirement imposed by the original
equipment. The SEF information is passed through a [?]0 ms hold circuit to
insure its presence during the amplitude rise at the end of the nasal. As a
further restriction to inhibit spurious SEF responses, it is required that the
nasal be preceded by a voiced sound of at least 60 ms duration. This is
accomplished by the 60 ms voicing delay circuit. If both preceding voicing and
the quantized SEF nasal signal occur, the Flip-Flop is set by means of the
left-hand "and" gate. This, in turn, enables the right-hand "and" gate to pass
the detected amplitude rise which in turn triggers the output one shot.
Flip-Flop reset is accomplished through the three input "or" gate. The output
of this card is then "or" gated with the output of the original nasal card
(#15).

This card provides an improvement when the SEF nasal signal
is detected or the drop in amplitude is pronounced enough to be detected.
These two conditions were found to be frequently missing. The worst case
occurs when the word is pronounced so rapidly that the nasal SEF level is
totally missing. Nothing short of the recommendation in Section V is felt to
be a satisfactory solution for this latter case.

It was found, however, that for the case of undetectable amplitude
changes, a modification could be made that would provide some indication of a
nasal's presence. This change involved changes in both the category and the
word logic. (The category-sequence word logic changes are described in Section
III.) The hierarchy of the category detectors assumes that a word is
composed of strings of vowels and fricatives interspersed with nasals and
stops. The beginning of a fricative or vowel is always signaled by the phonemic
recognizer with a general vowel or fricative category (13 and 14,
respectively) and then followed by the specific vowel or fricative category
(7, 8, 9, 10, 11 or 1, 2, 3). Thus, a typical vowel fricative sequence would
be 13-8-9-14-1. If a nasal occurs between two vowels, the general vowel
category #13 detection is generated before both vowels, i.e., 13-7-6-13-8. This
is accomplished in the vowel fricative detector (card #11) by resetting the
vowel and fricative duration detectors whenever a nasal is detected by card
15 and 15A. However, if the SEF nasal quantizer is not present for more than
1[?]0 ms, prohibiting an output on card 15, and the amplitude drop is too small
to satisfy the detector card 15A, no resetting of the vowel fricative detector
will occur.

The above category vowel-nasal-vowel sequence is then detected as 13-7-8.
This situation is particularly hard for the word logic to handle and leads
to very poor match scores. To reduce the penalty imposed upon the word logic
in such cases, a modification has been made to provide some indication of the
occurrence of a nasal when the nasal SEF detector is satisfied but the
amplitude criterion is not. The vowel-fricative detector (Figure 5) was modified
so that reset occurs by the occurrence of an output from the unfiltered SEF
nasal region quantizer rather than the actual output from the nasal detector.
Thus, if a short nasal is not detected because the amplitude change is too
small even though there is an output in the quantizer, then the vowel
category (13) will be inserted in the category sequence, i.e., 13-7-13-8. By
proper word logic design, the 13 in the middle of the vowel string may be
interpreted to mean the presence of a nasal in the word sequence.
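
One way the word logic could apply this interpretation is sketched below. The rule shown (a general vowel category 13 that directly follows a specific vowel category is read as an implied nasal) is an assumption made for illustration; the report's actual word logic changes are described in Section III. Category numbers follow Table 3, and the function name is hypothetical.

```python
SPECIFIC_VOWELS = {7, 8, 9, 10, 11}  # specific vowel categories
GENERAL_VOWEL = 13                   # general "vowel" marker
NASAL = 6                            # nasal category


def insert_implied_nasals(sequence):
    """Insert a nasal wherever a 13 appears in the middle of a vowel string.

    A 13 normally opens a vowel string. If it directly follows a specific
    vowel category, the vowel detector must have been reset mid-string,
    which this sketch treats as evidence of an undetected short nasal.
    """
    out = []
    for category in sequence:
        if category == GENERAL_VOWEL and out and out[-1] in SPECIFIC_VOWELS:
            out.append(NASAL)
        out.append(category)
    return out


normalized = insert_implied_nasals([13, 7, 13, 8])
```

After this step the partially detected sequence 13-7-13-8 has the same form as the fully detected 13-7-6-13-8, so both can be matched against the same stored examples.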


2.3 Stop Detector Modification

During the initial testing of the phonemic category word recognizer
after delivery to RADC, it was noticed that the unvoiced stop category
detector did not respond for certain people's articulation of "p" and "k".
Investigation showed that a transient in the voicing detector microphone
occurred for these people. This transient produced a 10 to 30 ms duration
voiced indication at the onset of the plosive release of the unvoiced energy
during the articulation of "p" and "k". This voiced indication was sufficiently
long to reset the Flip-Flop for the class 5 release detector through the three
input "or" gates shown in Figure 6. The resetting of this Flip-Flop inhibited
the detection of the release and thus the unvoiced stop. This situation was
cured by changing the voicing input to the delayed voicing signal which requires
that voicing be present a minimum of [?]0 ms before an indication of voicing is
produced. Thus, the short bursts of voiced indication were inhibited from
affecting the detection of unvoiced stop categories.
The large number of extra vowel recognitions indicated by the phonemic
category error analysis were felt to be caused by the circuit techniques
utilized to quantize and filter the SEF signal. The filtering is required to
eliminate a small amplitude noise component (100 to 300 Hz) which is found on
the basic [illegible] Hz SEF signal. This noise cannot be removed by simple low
pass filtering because there are also significant large amplitude step-like
changes in the SEF signal ([illegible] Hz) which must be preserved. The existing
phonemic category recognizer attempted to cope with this problem by quantizing
the SEF signal into a number of frequency regions and then filtering each
quantizer output with the hold and delay circuits on cards 6 and 7. These
circuits first filled in [?]0 ms or less gaps in the quantizer outputs and
secondly required at least 100 ms of signal to be present before a vowel
recognition was indicated. This type of processing has the disadvantage that
two vowels can be detected simultaneously if the noise and SEF signal are in
such a position that two quantizers are activated at the same time. This
action tends to produce a switching back and forth between two vowels which
results in extra vowel indications from the vowel category detectors.

An attempt was made during the current program to solve this problem by
filtering the SEF signal before quantization. This prequantizer filter was a
[?]0 Hz active low pass filter which could be switched in and out of the circuit
depending on the presence or absence of the large amplitude step changes in the
SEF signal. Thus, small amplitude noise components could be filtered out
without losing important step information.

Figure 7 shows the functional block diagram of the circuit developed to
perform this switchable filtering action. The SEF signal is fed to a [?] Hz
active low pass filter through a FET switch Q[?]. When a SEF step change occurs,
switch Q[?] is turned on and off so that the SEF signal bypasses the filter
and is applied directly to the output terminal. The switch is controlled by a
comparator and window threshold detector. The SEF input signal and the low
pass filtered SEF output signal are subtracted by the comparator. The output
of the comparator consists of the noise and step function components of the
SEF. These are fed to an upper and lower limit threshold detector whose
threshold limits are set by +V and -V. These limits are set high enough that
they are only exceeded by the large amplitude step changes in the SEF signal.
When such a change occurs (in either direction), the switch is turned off by the
window threshold detector output. This is done to protect the output signal
from influences of the step change until it is determined to be a step change
and not an isolated large amplitude noise spike. Two conditions must be
satisfied to verify the step change and turn on Q[?]. First, one full sampling
interval (a pitch period) of no change must occur between the detection of a
step SEF change and its verification. This rejects the influence of one
sampling interval SEF impulses that go up and down during adjacent sampling
intervals. Secondly, the turning on of Q[?] can occur no sooner than 20 ms
after the detection of the SEF change. These requirements are accomplished by
means of a window threshold detector, Flip-Flop, "and" gate, and 20 ms delay
as shown in Figure 7.
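
The behavior described above can be approximated in discrete time. The sketch below is a software analogue written for illustration only, not a model of the actual analog circuit: the one-pole filter, the sample spacing `dt_ms`, the smoothing factor `alpha`, and the window size are all assumed values, and only the 20 ms verification delay comes from the text.

```python
def switchable_lowpass(samples, dt_ms=5.0, alpha=0.2, window=1.0,
                       verify_ms=20.0):
    """Low-pass filter that is bypassed once a large step is verified.

    Small deviations (inside +/- window) are smoothed by a one-pole filter.
    A large deviation freezes the output until the input has stopped
    changing and verify_ms has elapsed; then the output jumps to the new
    level. A spike that returns within the window is never passed on.
    """
    y = samples[0]
    prev_x = samples[0]
    pending_since = None  # time at which the current large deviation began
    out = []
    for i, x in enumerate(samples):
        t = i * dt_ms
        if abs(x - y) <= window:
            # Inside the window: ordinary filtering, nothing to verify.
            pending_since = None
            y += alpha * (x - y)
        elif pending_since is None or abs(x - prev_x) > window:
            # New deviation, or input still moving: hold output, restart.
            pending_since = t
        elif t - pending_since >= verify_ms:
            # Stable for long enough: accept it as a step, bypass filter.
            y = x
            pending_since = None
        prev_x = x
        out.append(y)
    return out


step_out = switchable_lowpass([0.0, 0.0] + [10.0] * 8)
spike_out = switchable_lowpass([0.0, 0.0, 10.0, 0.0, 0.0])
```

With the assumed 5 ms spacing, the step appears at the output 20 ms after it is first seen, while the single-sample spike leaves the output unchanged.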

The operation of this circuit proved quite satisfactory in filtering
the SEF signal without introducing objectionable distortions. The circuit was
not, however, installed in the phonemic category recognizer because of the
numerous other changes required for installation. It is, however, recommended
in Section V that such a change be ultimately added to the phonemic category
recognizer.


2.5  Go  Light 

The purpose of the go light is (1) to provide the speaker with a visual
indication that the computer is ready to accept the next word and (2) to
disable the hardware while the computer is processing data, i.e., the teletype
printing out results. Figure 8 shows the block diagram for this circuit.

When the computer is operating, the transistor switch is turned on, inhibiting
the operation of the category recognizer by clamping the amplitude detector in
a no-speech condition. The go light operation is initiated by the computer
ready gate. A comparator is used for level conversion of the computer signal.
The comparator triggers a .75 second delay to allow sufficient time for all
noise from the teletype to die out. The .75 second delayed ready gate turns on
the "go" light by means of the lamp driver and turns off the transistor switch.
Turning the switch off enables the amplitude gate detector to operate in its
normal manner. As soon as a word is spoken, the go light is turned out by the
computer ready gate signal. This action again turns on the switch, disabling
the phoneme category recognizer by forcing a no-speech condition on the
amplitude gate signal line.


2.6 Voicing Detector Modification


The voicing detector utilized in the phonemic category recognizer
operated by means of a microphone in contact with the top of the head to pick up
mechanical vibrations associated with vocal cord activity. (See final report
for Contract AF 30(602)^170.) This technique, while extremely accurate,
precludes the use of the recognizer for prerecorded speech where this special
microphone was not available. An attempt was made during the program to
develop a voicing circuit that would operate without the use of the cranium
microphone.

The functional block diagram for the new circuit is shown in Figure 9.
Three parameters were utilized to form the output decision:


1. the derivative of the dual time constant
peak-detected amplitude parameter.


2. the zero crossing rate of the clipped
speech.


3. the low frequency component of the
amplitude parameter.


The peak-detected amplitude waveform has a spectral component, the frequency
of which is a function of the state of voicing. For unvoiced sounds, the peak
detector tends to charge in small increments at an average rate of 1000 to
2000 Hz. During voiced sounds the peak detector is influenced by the pitch
period and thus tends to charge in fewer but much larger steps with an average
rate of 100 to 300 Hz. This difference in peak detector charging frequency is
detected by triggering a 500 one shot MV with the differentiated (RC) and
amplified amplitude parameter. Low pass filtering of the one shot output
converts the pulse rep rate to a baseband parameter whose absolute value
correlates with the state of voicing.

This signal is resistively added to a second signal derived from the
clipped speech signal. The average zero crossing rate is extracted
by triggering the 50 µs one shot MV each time the
clipped speech signal changes state. The one shot is used to reset a ramp
generator (Q, C, and R0) whose average energy is proportional to the period
between zero crossings. Low pass filtering is again used to reduce the signal
to a baseband parameter.

The summed output of these two signals is subtracted from an adjustable
DC signal in a difference amplifier. This DC voltage serves as a threshold
adjustment for the circuit. The resultant signal is fed to a comparator along
with the amplitude parameter. The amplitude parameter also correlates with
voicing in that unvoiced sounds are of lower amplitude than voiced sounds
(when considering the system frequency response characteristics). The
comparator output is a bilevel signal indicating the voiced and unvoiced states.

This circuit was tested and found to work satisfactorily in good
signal-to-noise environments when compared to the cranium microphone. However,
because of performance losses under poor signal-to-noise conditions, this
circuit was not installed in the equipment.


The phonemic category recognition rates indicated by the error analysis
of Section II are obviously something less than perfect. Such category
recognition rates most certainly would present serious difficulties in a word
recognition logic which requires all phonemes in a word to be correctly
detected. The effects of such a requirement are shown by the lower dotted line
in Figure 10. Here, the size of the word, in number of phonemes, versus the
probability of having correctly detected all of the phonemes within the word is
plotted. The figure assumes that the average probability of detecting each
phoneme is 0.9. As shown, a five-phoneme word would be correctly recognized less
than 60% of the time, while an eight-phoneme word would be recognized a little
better than 40% of the time. Such results are, of course, very discouraging and
seem contrary to the fact that longer words should be more easily recognized
by virtue of their additional redundancy. In order to improve word recognition
rates, it is obvious that a word logic must be capable of exploiting
vocabulary redundancies.

The effect of using such redundancy is shown by considering the
recognition rate of the example just described when any one phoneme in the
word is allowed to be missing from the detected phonemic string. The middle
dotted line in Figure 10 shows these results. Now, a five-phoneme word would
be recognized better than 90% of the time, while an eight-phoneme word is
recognized better than [illegible] of the time. A further refinement of this
word recognition technique would be to carry along with the recognition
indication the number of phonemes found to be missing. This could be used as
part of a measure of recognition confidence or match quality. It might be
argued that if a word in the vocabulary (word A) had a correct phonemic
spelling identical to the word with a missing phoneme (word B), then a
multiple recognition would occur whenever the word "A" were spoken. This,
naturally, could be resolved by noting that the word "B" has a missing
phoneme and thus a lower match quality. If the reverse occurred, i.e., word
"B" were spoken and the phonemic category detectors failed to recognize a
phoneme such that it produced a sequence identical to the spelling of word
"A", then no harm is done because it would have been misrecognized by either
system.

(Figure 10, top curve label: any number of non-adjacent phonemes may be
missing.)

The next logical extension of this concept is to allow any non-adjacent
phoneme in the word to be missing. The effects of this procedure on
recognition rates are shown in the top line of Figure 10. Now, the
eight-phoneme word is recognized 93% of the time. The word recognition rate
has now exceeded the phonemic category rate for all words less than about 12
phonemes in length.
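The three curves discussed above can be approximated with a short
calculation. The following Python sketch uses the figure's stated average
detection probability of 0.9 per phoneme and additionally assumes that
detections are statistically independent, which the report does not state.
The function names are illustrative only.

```python
from math import comb


def rate_all_detected(n: int, p: float) -> float:
    """Every one of the n phonemes must be detected (lower curve)."""
    return p ** n


def rate_one_missing(n: int, p: float) -> float:
    """At most one phoneme may be missing (middle curve)."""
    q = 1.0 - p
    return p ** n + n * q * p ** (n - 1)


def rate_nonadjacent_missing(n: int, p: float) -> float:
    """Any number of phonemes may be missing if no two are adjacent (top curve)."""
    q = 1.0 - p
    total = 0.0
    for k in range((n + 1) // 2 + 1):
        # comb(n - k + 1, k) counts the ways to choose k non-adjacent
        # positions out of n; it is zero when k is too large.
        total += comb(n - k + 1, k) * q ** k * p ** (n - k)
    return total


if __name__ == "__main__":
    for n in (5, 8):
        print(n,
              round(rate_all_detected(n, 0.9), 3),
              round(rate_one_missing(n, 0.9), 3),
              round(rate_nonadjacent_missing(n, 0.9), 3))
```

Under these assumptions the five- and eight-phoneme rates come out near 0.59
and 0.43 when all phonemes are required, near 0.92 and 0.81 when one phoneme
may be missing, and near 0.94 for the eight-phoneme word when any
non-adjacent phonemes may be missing.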

The category string word recognition logic chosen for this program is
based upon these concepts. They are summarized as follows:


1.  Word matches will be made in spite of
    missing or extra phonemes in the
    detected phonemic category string.

2.  The number and type of missing and extra
    phonemes will be preserved and carried
    as a match quality for final recognition
    decision.

3.  The final word recognition will be based
    upon the relative distribution of matched,
    missing, and extra phonemes.


3.  Library Concept


The library training concept proposed and implemented by RADC personnel
provides a set of examples (several for each library word) against which a
test word is matched. Each speaker using the recognizer must provide a set of
library samples before the word recognition mode is initiated. This technique
reduces the speaker sensitivity of the recognizer.

The optimum number of library samples required for each vocabulary word
was determined to be three by experimentation at RADC. More than three
library samples per vocabulary word do not appear to increase the recognition
rate by any appreciable amount. This number is, of course, a function of
phonemic category logic design and thus any major changes in its design would
be expected to change these results.
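The library concept amounts to a nearest-neighbor decision over several
stored samples per word. The Python sketch below illustrates the idea only.
The scoring function toy_score ignores phoneme order and is a stand-in for
the report's matching logic, and the category sequences other than the one
quoted for 'time' are invented for illustration.

```python
from collections import Counter
from math import inf
from typing import Callable, Dict, List, Sequence, Tuple

Library = Dict[str, List[Sequence[int]]]


def toy_score(test: Sequence[int], sample: Sequence[int]) -> float:
    """Stand-in score: extra categories divided by matched categories.

    Order is ignored here, unlike the report's sequence matching.
    """
    matched = sum((Counter(test) & Counter(sample)).values())
    extras = len(test) + len(sample) - 2 * matched
    return extras / matched if matched else inf


def recognize(test: Sequence[int], library: Library,
              score: Callable[[Sequence[int], Sequence[int]], float]
              ) -> Tuple[str, float]:
    """Return the library word whose best sample has the lowest score."""
    best_word, best_q = "", inf
    for word, samples in library.items():
        for sample in samples:
            q = score(test, sample)
            if q < best_q:
                best_word, best_q = word, q
    return best_word, best_q


# Three samples per word, as the report found sufficient.
LIBRARY: Library = {
    "TIME": [[4, 13, 9, 7, 6], [4, 13, 9, 6], [4, 13, 7, 6]],
    "SIX": [[14, 1, 13, 8, 4, 14, 1], [14, 1, 13, 8, 14, 1], [14, 13, 8, 4, 14]],
}

if __name__ == "__main__":
    print(recognize([4, 13, 9, 6, 5], LIBRARY, toy_score))
```

Keeping several samples per word lets a test utterance match whichever
stored variant of that speaker's pronunciation is closest.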


The matching of test word and library word phonemic category sequences
is complicated by the fact that there are usually differences between the two
sequences. This is obviously the case when two different words are being
matched, but also frequently occurs when the words being matched are the
same. Variability in both the human articulation and equipment operation
produces small but significant differences in the detected phonemic category
sequence for two utterances of the same word even when spoken by the same
person. Thus, in matching the phonemic category sequences, it has been found
desirable to first match gross features in the phonemic category sequence
(vowel and fricative categories) and then match the fine detail between these
gross phonemic features. In this manner, general similarities between the
test and library words may be found without interference from the minor
differences between the words.

The category sequence matching algorithm achieves this objective by
utilizing an observed general phonemic structure of words. This general
phonemic structure can be characterized by a sequence of vowels and
fricatives interspersed with stops and nasals, i.e., vowel, stops and/or
nasals, fricatives, stops and/or nasals, etc. In terms of phonemic categories
used in the phonemic category recognizer, a word may be described by the
general sequence shown below by assuming that the sequence may be started and
ended at any point, and that elements may be deleted.


Vowel (13) --> Specific Vowel (7 through 11) -->
Nasal and/or Stop (4 through 6) --> Fricative (14) -->
Nasal and/or Stop (4 through 6) --> Vowel (13) -->
Specific Vowel (7 through 11) --> Nasal and/or Stop (4 through 6)


Thus, for example, the word 'time', which has a phonemic category sequence
(4) (13) (9,7) (6), matches the general sequence beginning at point A and
ending at point B.
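The grouping of a detected category sequence into the gross elements of the
general sequence can be sketched as follows. This is an illustrative Python
fragment, not the report's software. The class names are taken from the
category ranges given in the text, and the function names and the "other"
label for any remaining category are assumptions.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def gross_class(category: int) -> str:
    """Map a phonemic category number to its element of the general sequence."""
    if category == 13:
        return "vowel"
    if 7 <= category <= 11:
        return "specific vowel"
    if 4 <= category <= 6:
        return "nasal/stop"
    if category == 14:
        return "fricative"
    if 1 <= category <= 3:
        return "specific fricative"
    return "other"  # assumption: categories not described in the text


def segment(sequence: Sequence[int]) -> List[Tuple[str, List[int]]]:
    """Group consecutive categories of the same gross class into one element."""
    return [(label, list(run)) for label, run in groupby(sequence, key=gross_class)]


if __name__ == "__main__":
    # The category sequence quoted for the word 'time'.
    for label, run in segment([4, 13, 9, 7, 6]):
        print(label, run)
```

For 'time' this yields a nasal/stop element, the general vowel, a
specific-vowel run (9, 7), and a closing nasal/stop element.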

The category sequence matching algorithm uses rules derived from the
structural characteristics of this general sequence. These rules have been
devised to force a match based primarily upon gross features of the phonemic
category sequence. These rules have been derived from the following
considerations.

During the matching of each phoneme category in the library and test
word, the phonemic categories will either be the same or different. If they
are the same, a match is considered made and the next pair of phonemic
categories in the test and library sequences is compared. If the phonemic
categories are different, then it must be assumed that an extra phoneme has
occurred in either the test word or library word. The total number of extra
phonemes that must be assumed to complete the word match is dependent upon
the choice of where the extra phoneme occurred: the test word or the library
word.

To demonstrate this situation, consider the matching of the following
sequences:


    library word    13  7  4  14  1

    test word       13  7  14  1

The 13 and 7 of both sequences would be matched. To continue the match,
either the 4 in the library word or the 14 in the test word must now be
assumed to be extra. If the 4 is called extra, the 14 and 1 would be matched
leaving a total of one extra phoneme for the word match. If the 14 in the
test word is called extra, then only the 1's could be matched leaving a total
of 3 extras for the final word match (14, 14, 4). By consulting the
previously described general sequence, the choice of assumed extra library or
test phonemes which will produce a minimum number of extra phonemes can be
predicted. Examination of each possible combination of matches in the general
sequence provides this information. For example, in the general sequence
shown below, a test word (14) matched in the position shown


    library  (13)(7-11)(4-6)(14)(1-3)(4-6)(13)(7-11)(4-6)(14)

    test     (13)(7-11)(14)


produces a minimum of extra categories by assuming that an extra (4 through
6) has occurred in the library word.
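The report selects the extra phoneme by rule, using the tabulated matrix
described next. As an independent check on the walk-through above, the
minimum number of extras can also be found by a longest-common-subsequence
calculation. The Python sketch below uses that substituted technique, not the
report's rule table, and the function name is an assumption. The sequences
are those implied by the walk-through (library 13 7 4 14 1, test 13 7 14 1).

```python
from typing import Sequence, Tuple


def min_extras(library: Sequence[int], test: Sequence[int]) -> Tuple[int, int]:
    """Return (matches, extras) for the alignment with the fewest extras.

    An "extra" is any category in either sequence left unmatched when
    matched pairs must keep their order in both sequences.
    """
    n, m = len(library), len(test)
    # lcs[i][j] = longest common subsequence of library[i:] and test[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if library[i] == test[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    matches = lcs[0][0]
    return matches, n + m - 2 * matches


if __name__ == "__main__":
    print(min_extras([13, 7, 4, 14, 1], [13, 7, 14, 1]))
```

For the example this gives four matches and one extra, agreeing with the
choice of calling the library 4 extra rather than the test 14.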

The rules resulting from this comparison have been tabulated in a
6 x 6 matrix for use in the category sequence word matching algorithm and are
shown in Table 6. For each combination of test word and library word phonemic
category, an algorithm action is indicated. As shown, the fine detail within
the general sequences, i.e., specific fricative (1, 2, 3), stop and/or nasal
(4, 5, 6), and specific vowel (7, 8, 9, 10, 11), is matched in a separate
routine called string matching.


3.  String Matching


Strings are any combination of 7, 8, 9, 10, and 11 or 4, 5, and 6 or
1, 2, and 3 in an unbroken sequence. The intention of the string routine is
to systematically find the maximum number of phonemic category matches that
can be made solely within the string. This process is accomplished by
starting at the end of the string and moving toward the beginning of the
string (front of the word), making all possible matches until an extra
phoneme is encountered. All matches to this point are recorded. The matching
next proceeds to the front of the string and moves toward the end of the
string. Again, each phoneme in the test word string is compared with each
phoneme in the library word string. When matches are made, the phonemes in
question are made unavailable for matches with other unmatched phonemes.
After all matches are made, cross linkages between matched pairs are noted.
If any occur, a penalty is applied to the match in the form of one negative
match. Thus, in the following example, there are a total of four matched
categories and two extra categories.


library word string
test word string


As shown, there are 4 matches but the crossover of linkages reduces the match
count by one. A maximum of one negative match is given even when more than
one cross linkage occurs.
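One possible reading of the string routine is sketched below in Python. The
two-pass order (end of string first, then front of string) follows the
description above, but the exact pairing rule, the function name, and the
example strings are assumptions made for illustration. The original example
strings are not legible in this copy.

```python
from typing import List, Sequence, Tuple


def string_match(library: Sequence[int], test: Sequence[int]) -> Tuple[int, int, int]:
    """Return (matches, extras, penalty) for two strings of one category class."""
    lib_used = [False] * len(library)
    test_used = [False] * len(test)
    pairs: List[Tuple[int, int]] = []

    # Pass 1: start at the end of both strings and move toward the front,
    # pairing equal categories until the first disagreement.
    i, j = len(library) - 1, len(test) - 1
    while i >= 0 and j >= 0 and library[i] == test[j]:
        lib_used[i] = test_used[j] = True
        pairs.append((i, j))
        i, j = i - 1, j - 1

    # Pass 2: start at the front; compare each unmatched test category with
    # each unmatched library category. Matched items become unavailable.
    for tj, t in enumerate(test):
        if test_used[tj]:
            continue
        for li, l in enumerate(library):
            if not lib_used[li] and l == t:
                lib_used[li] = test_used[tj] = True
                pairs.append((li, tj))
                break

    # A cross linkage exists when two matched pairs appear in opposite order
    # in the two strings. At most one negative match is charged.
    crossed = any((a[0] - b[0]) * (a[1] - b[1]) < 0 for a in pairs for b in pairs)
    extras = lib_used.count(False) + test_used.count(False)
    return len(pairs), extras, 1 if crossed else 0


if __name__ == "__main__":
    # Hypothetical vowel strings chosen to give four matches and two extras.
    print(string_match([7, 9, 8, 11, 10], [9, 7, 8, 8, 10]))
```

With these hypothetical strings the routine finds four matches and two
extras, and the crossed 7/9 linkage costs one negative match, so the scored
match count is three.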


Modification to String Routine When a Nasal is Present


Software work was also directed toward the elimination of word errors
that occur when a spoken nasal is detected in either the library word or test
word but not both. Under these conditions, the word logic frequently produces
an incorrect word decision because the matching of gross features is
disrupted and produces very poor matching scores. The effect of this problem
was reduced by assuming that a nasal could be missing if a nasal occurred in
either the test or library word and not the other. Thus, as a special case,
an exception is made to the definition of vowel strings (combinations of 7,
8, 9, 10 or 11) when a 13, 6-13 or 6-5-13 occurs within a sequence of vowels.
For example, the category sequences 9-10-6-13-7-8, 9-10-13-7-8 and
9-10-6-5-13-7-8 are considered to be continuous vowel strings. String
matching under these conditions is made as in a string match, but the scoring
is different. Extra 13's are not counted but extra 6's and 5's are counted.
For example,


4 matches


5 matches 
1 extra  nasal  (6) 


4 matches 

2 extra  nasals  (5,6) 


4 matches 
1 extra  nasal  (6) 
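The nasal exception can be expressed as a small test on a category sequence.
The Python sketch below is illustrative only. The function names are
assumptions, and the requirement that a vowel appear on both sides of the
embedded 13, 6-13 or 6-5-13 is an interpretation of "within a sequence of
vowels".

```python
from typing import List, Sequence

VOWELS = {7, 8, 9, 10, 11}
# Embedded groups tolerated inside a vowel string, longest first.
BRIDGES = ([6, 5, 13], [6, 13], [13])


def is_continuous_vowel_string(seq: Sequence[int]) -> bool:
    """True if seq is a vowel string under the nasal exception."""
    i, n = 0, len(seq)
    seen_vowel = False
    while i < n:
        if seq[i] in VOWELS:
            seen_vowel = True
            i += 1
            continue
        for bridge in BRIDGES:
            k = len(bridge)
            # The group must follow a vowel and be followed by a vowel.
            if (seen_vowel and list(seq[i:i + k]) == bridge
                    and i + k < n and seq[i + k] in VOWELS):
                i += k
                break
        else:
            return False  # no tolerated group starts here
    return seen_vowel


def countable_extras(extras: Sequence[int]) -> List[int]:
    """Extra 13's are not scored; extra nasals (5, 6) and others are."""
    return [c for c in extras if c != 13]


if __name__ == "__main__":
    for s in ([9, 10, 6, 13, 7, 8], [9, 10, 13, 7, 8], [9, 10, 6, 5, 13, 7, 8]):
        print(s, is_continuous_vowel_string(s))
    print(countable_extras([6, 13]))
```

The three sequences quoted in the text are all accepted, and an unmatched
6-13 group contributes one countable extra nasal (6).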


Match  Quality  Calculation 


A match using the techniques described in the preceding sections is
made between a test word and every word in the training library. The match
quality for each word is calculated and the final recognition choice is made
by selecting the library word with the best match score. The calculation of
the match quality or "Q" is performed by dividing the number of extra
categories by the number of match categories for each library word. Two
variations to this basic routine for calculating Q have been tried. The first
method is weighted by the relatively greater importance of the consonant
categories (1, 2, 3, 4, 5 and 6) and the general fricative and vowel
categories (13 and 14). Additional weight is applied to these categories by
multiplying the appropriate extra category count by two. The value of Q is
given by the following equation.


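\[ Q = \frac{2 \sum_{i} E_i + \sum_{j} E_j}{\sum_{i} M_i + \sum_{j} M_j} \]


where  E_i = extra category i (i = 1 through 6, 13 and 14)

       E_j = extra category j (all other categories)

       M_i = matched category i

       M_j = matched category j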
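The first scoring method can be illustrated with a few lines of Python. This
is a sketch from the verbal description only: extras in categories 1 through
6, 13 and 14 count double, and the total is divided by the number of matched
categories. Leaving the denominator unweighted and returning infinity when
nothing matches are assumptions, as are the names used.

```python
from math import inf
from typing import Sequence

# Categories whose extras are counted twice under the first method.
HEAVY = {1, 2, 3, 4, 5, 6, 13, 14}


def match_quality(extras: Sequence[int], matched_count: int) -> float:
    """Weighted extras divided by matched categories; lower is better."""
    if matched_count == 0:
        return inf  # assumption: no matches means the worst possible score
    weighted = sum(2 if c in HEAVY else 1 for c in extras)
    return weighted / matched_count


if __name__ == "__main__":
    # One extra stop/nasal (4) against four matches.
    print(match_quality([4], 4))
    # One extra specific vowel (9) against four matches.
    print(match_quality([9], 4))
```

An extra consonant therefore degrades Q twice as much as an extra specific
vowel for the same number of matches.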


The second match scoring method makes use of the data obtained from
phonemic category error analysis. Weights are given to extra and matched
categories according to their probability of detection and non-detection.
The equation for this method of scoring is given by:


The category sequence word recognition software written for the program can
provide as an option either of these two forms of Q calculation.


3.7  Software Options


Five basic program options are provided for in the software. These
options are:


1.  Two different methods of calculating "Q".

2.  Adjustable limits of Q values for acceptable
    recognition.

3.  A library evaluate routine in which each
    library word is tested against every other
    word in the library to determine the quality
    of the library samples.

4.  A tabulate routine that lists the total number
    of matches and deletions for each phonemic
    category which occurred during the
    testing of a series of words.

5.  A phonemic category combiner routine to reduce
    the total number of [illegible].


Table 7 shows the methods of calling for each option via computer hardware
sense switches and data words. Sense switch 1 is used to continue or stop the
program as desired. Sense switch 2 is used to control the number of words
printed out after each recognition attempt. Position 1 prints out the three
library words which have the highest Q values, whereas position 2 prints out
only the first choice word. Sense switch 3 is used to change the values of
"DELTA" and "THRESH". These two data words provide a method to reject
recognitions which have excessively poor (large) Q values or recognitions
whose Q values are too close to each other. Position 1 of sense switch 3 is
used for continued use of THRESH and DELTA values, while position 2 is used
to change these values. When all library matches exceed either of these two
limits during the matching of a test word, the word "EH?" is printed out.

Data word "CONTROL" is used to change several program options. The
first of these is called "old" and "new". The old option uses the first
method of calculating Q, while new uses the error analysis calculation
method. [Remainder of paragraph illegible; surviving fragment: "... are used
and the results printed ..."]
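The rejection limits THRESH and DELTA lend themselves to a short
illustration. The Python sketch below is one plausible reading, not the
report's code: a recognition is rejected if the best Q exceeds THRESH, or if
the best Q for a different word lies within DELTA of it. The rejection word
"EH?", the function name, and the sample scores are assumptions.

```python
from typing import List, Tuple

REJECT_WORD = "EH?"  # assumed reading of the printed rejection word


def decide(scores: List[Tuple[str, float]], thresh: float, delta: float) -> str:
    """Pick the lowest-Q word, or reject if it is too poor or too ambiguous."""
    if not scores:
        return REJECT_WORD
    ranked = sorted(scores, key=lambda item: item[1])
    best_word, best_q = ranked[0]
    if best_q > thresh:
        return REJECT_WORD  # best match is excessively poor (large Q)
    for word, q in ranked[1:]:
        if word != best_word:
            if q - best_q < delta:
                return REJECT_WORD  # closest rival word is too close
            break
    return best_word


if __name__ == "__main__":
    print(decide([("ZERO", 0.0), ("ZERO", 0.3333), ("ONE", 1.0)], 1.0, 0.2))
    print(decide([("COMPARE", 0.6), ("EQUAL", 0.6667)], 1.0, 0.2))
```

Several samples of the same word may rank near each other without causing a
rejection; only a rival word within DELTA does.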


SECTION IV
RESULTS

4.1  Introduction

The performance level of the phonemic category word recognizer was
considerably improved during the program with respect to speaker-to-speaker
variability. This gain was accomplished, however, with the additional expense
of a speaker sample library. Such a library was shown to be easily generated
and useful in adapting the recognizer to the speaker. Typical speaker
performance for a 25-word library, utilizing the final form recognizer, is
shown in Table 8. The results for a single speaker are analyzed in detail in
the following subsections.

U . . T : air.il..-,  I. . I r o ;.  o i 

The vocabulary utilized for this test consisted of 25 Fortran words.
The vocabulary list was uttered three times, providing three samples of each
vocabulary word. The "Evaluate" technique was used to test the samples, and
those which were confused with other vocabulary words were rejected. New
samples were then inserted in their place. Three iterations of the evaluate
routine were required to optimize the library. The final 75-word sample
library is shown in Table 9. The library ID number, word and category
sequence are shown in this table.
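The evaluate pass described above can be sketched as a leave-one-out check of
the library. In the Python fragment below every sample is scored against all
other samples, and any sample whose nearest neighbor belongs to a different
word is flagged for replacement. The scoring function is an order-insensitive
stand-in and the category sequences are invented, not taken from Table 9.

```python
from collections import Counter
from math import inf
from typing import Callable, Dict, List, Sequence, Tuple

Library = Dict[str, List[Sequence[int]]]


def toy_score(a: Sequence[int], b: Sequence[int]) -> float:
    """Stand-in score: extras over matches, ignoring phoneme order."""
    matched = sum((Counter(a) & Counter(b)).values())
    return (len(a) + len(b) - 2 * matched) / matched if matched else inf


def evaluate_library(library: Library,
                     score: Callable[[Sequence[int], Sequence[int]], float]
                     ) -> List[Tuple[str, Tuple[int, ...], str]]:
    """List (word, sample, confused_with) for samples nearest to another word."""
    samples = [(w, seq) for w, seqs in library.items() for seq in seqs]
    confused = []
    for i, (word, seq) in enumerate(samples):
        others = [(score(seq, s), w) for j, (w, s) in enumerate(samples) if j != i]
        if not others:
            continue
        _, nearest_word = min(others)
        if nearest_word != word:
            confused.append((word, tuple(seq), nearest_word))
    return confused


# Hypothetical library; the third ZERO sample is deliberately poor.
LIBRARY: Library = {
    "ZERO": [[13, 7, 10, 8, 9], [13, 7, 9, 8, 10], [4, 13, 9, 8]],
    "TIMES": [[4, 13, 9, 8, 11, 6, 14, 1], [4, 13, 8, 11, 14, 1],
              [4, 13, 9, 8, 11, 14]],
}

if __name__ == "__main__":
    print(evaluate_library(LIBRARY, toy_score))
```

Flagged samples would be re-recorded and the check repeated, which
corresponds to the three iterations reported for this library.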

4.3  Recognition Results

The recognition tests were carried out by uttering four samples for
each of the 25 vocabulary words. Table 10 shows these results for both
methods of calculating Q. The first group of these words shows the top three
choices and their Q values for the first method of calculation. The
recognized word is considered to be the word with the lowest Q value. The
second group of words shows the results for the error analysis weighted value
calculation method. Again, the lowest Q value word is the recognized word. The




Speaker #1


Speaker #2


Speaker #3


Speaker #4


Average


Table 9


Library  Sequence 
L i'  . y_i\  jvci  V»  cabulary 


word 

1 

kii.us 

V.V  RD 

ii 

6 io 

i r 

io  ) ib  i i. 

13  U 

10  ) 1. 

WORD  - 

i.’ 

el® 

WORD 

l.  c j:ia 

13  11 

io 

» 3 7 *">  1- 

b 13 

) 11  6 13  9 li 

WORD  = 

3 

COMPARE 

W.  RD 

13  takf. 

b 5 13  ) 

10  6 r>  b 13  & 

) 1 

b 13 

3 7 b lb  1 

Wv  RD 

b 

E.V  :AL 

v:  RD 

lb  KIKE 

13  V 

b 13 

11  10  > lb  1. 

C 13 

R i S 7 1 

W RD  • 

r> 

.TK.R0 

W.  RD 

13  AOS  IKK 

It  * 

-»  i l i -i  i v i ale  1 

13  V 

( 13 

10  <•!  1 < I. 

v; . RD 

lr  eTTIeK'S 

ll  \.  i’-J 

t 

1 

b 13 

( 3 

) n 7 6 lb  1 

1. 

b 13  1 

b lb  ^ 13  3 1 U C 13  7 lb  1 1 

v;  rd 

7 

DIVIDE 

WORD 

1'  SIX 

13  10 

, i 

> •* 

6 lo 

lb  1 

1 3 '3  b lb  1 1. 

Wv/iO 

sever 

WORD 

Xw  ijLwWJ  • l 

lb  1 

13  c 

i i 5 13  10 

11  e 1 

lb  1 13  10  \)  3 7 Id  S b lb  . 1. 

VuVrtD 

RAISE 

Vv  HD 

i > EiGirr 

ib  i 

lj  10  b lb  1 \i 

13  7 

b lb  10 

W\  RD 

- 10 

LOGICAL 

Wl  iu) 

. o KETI'RN 

13  11 

10 

) 3 i 10  5 13 

7 •’* 

13  n 

8 7 b 13  8 11  < 1. 

13  10 

ll* 

bO 


Table  9 (Continued) 


WORD 


WORD 


pivid: 


Ii.TT'T 


Wi  RD 


WORD 


W CRD  2 ? F ORTK/O  I 

1U  2 13  10  > 10  8 It  13 


K,  RD 


C.  "VAKE 


EQUAL 


WORD  40  ADS 
13  11  7 1>*  1 13  9 8 


WORD 


wlrd  - ui  oraicia 

5 13  ,)  It  lit  13  8 9 10  11  6 lU  1 1, 


WO\D  - Ui  GliVSH 

lit  l 13  U 10  9 8 10  8 7 10  a 7 l1-  1. 


EIU1IT 


RETVRi; 


Vt  KD 


oivipe 


Wv  KD 


1\  RTRA1 


Table  9 (Continued) 


V.YRD 


WORD 


WCRD 


WORD 


ASi 


V>\  RD 


RRTVRU 


Table 10


Match Scores for Four Test
Examples for Each Library Word

MINUS  ONE


6 13 

9 

6 

7 10 

7 

14 

1 

1.' 

13  11 

j.0 

9 0 11 

6 

13 

7 12 

WORD 

, 

Os 

Minus 

Q 

- 0.3714 

WORD 

2 

CI.’E 

Q 

■S 

0.8333 

W.  KD 

- 

56 

TIKES 

Q 

1.3333 

We  RD  - 

52 

ORE 

Q 

= 

0.3333 

We  RD 

- 

31 

TIRES 

Q 

1.6  )00 

W.RD  = 

Q*7 

CI.’E 

Q 

1.0  0 

We  RD 

. 6 

Mil  JUS 

Q 

- O.h.  CX) 

WeRD 

1. 

CJMA 

Q 

0.8750 

W e KD 

- 

1 

• rr?n  »r* 

t'UilO  ej 

Q 

= 0.4444 

W.RD  -- 

CEE 

Q 

l.oooo 

WeRD 

- 

56 

TIMES 

1.0000 

WORD 

27 

c:.e 

Q 

1.1667 

13  8 

7 

10  3 7 

10 

11 

B 

11*  1 is 

13  13 

10 

9 7 6 

12 

Wv  RD 

26 

Kurus 

Q 

0.7143 

We  RD 

11 

FOR 

Q 

■= 

0.5000 

W ORD 

- 

31 

TIKES 

Q 

l.li  u7 

WORD 

c 

C .’.E 

Q 

= 

0.80  « 

word 

= 

43 

Q 

• 1.3333 

WC  RD 

c t 

e ..E 

Q 

l.OC  ) 

W.KD 

s 

i6 

hi;  a ij 

£ 

0.7143 

W.RD  • 

"l  1 

F :'R 

/■> 

Si 

0.*  "0 

w;hd 

= 

31 

TIMES 

0 

1.3667 

VJ . RD 

e EE 

Q 

0.6000 

w.rd 

= 

5t> 

TIKES 

Q 

- 1.3333 

We  HD 

l? 

eEE 

M 

H 

0.8000 

6 13 

9 

8 

7 10 

5 

14 

1 

1; 

13  11 

10 

0 3 6 

13 

7 

4 

12 

We  RD 

C 

MIDI’S 

Q 

1.1667 

W HD 

2 

eEE 

Q 

= 

1 . K X) 

W RD 

= 

56 

TIMES 

Q 

• 1.5000 

WeRD 

10 

LO’.ICAL 

Q 

- 

1.1.  50 

WORD 

* 

31 

i I*  uie) 

Q 

- 1.6000 

We  RD 

. 1 

e EE 

Q 

- 

1.0667 

WORD 

1 

Kn.vs 

Q 

= 0.4444 

W\  RD  =- 

10 

IjvVi  ical 

Q 

1.1. 50 

We  RD 

- 

* . 

KIND'S 

Q 

0.6333 

We'RD  - 

2 

e ES 

Q 

= 

l.lt-t'7 

Wi  RD 

56 

TIMES 

Q 

1.0000 

We  RD  = 

27 

eEE 

Q 

1.3333 

6 13 

9 

8 

7 10 

5 

14 

1 

le 

13  11 

10 

8 9 

6 

12 

WORD 

. Z' 

eU 

MIIJUS 

Q 

1.1667 

WORD 

27 

CI.’E 

Q 

O.lU;  0 

WORD 

s 

56 

II.  Ee 

Q 

1.5  >00 

We  RD 

2 

CEE 

Q 

0.3333 

WORD 

“ 

31  TIKES 

Q 

= l.BOOO 

WORD  = 

50 

CEE 

Q 

0.8. XX) 

WORD 

1 

kwvs 

6. 

0.4444 

WORD  * 

27 

CI.’E 

Q 

S 

0.1000 

WeKD 

s 

:v 

KIRI'S 

Q 

= 0.8333 

WeRD  = 

2 

CEE 

Q 

a 

0.2222 

Wei  CD 

- 

56 

TIMES 

Q 

= 1.0000 

WORD  -• 

52 

eEE 

Q 

= 

0.5000 

Table 10 (Continued)


COMPARE


<*  13 

9 10  6 i*  13  13 

7 9 

10  1,? 

5 13 

7 It  13  10 

lu 

WORD 

o8  COMPARE 

Q 

0.6000 

WORD 

00  F.Q’>AL 

Q 

0.60V  V) 

WORD 

3 COMPARE 

Q - 

0.6667 

Wv  RD 

•><»  EQUAL 

Q 

■ 0.("V'0 

WORD 

33  COMPARE 

Q 

0. 71 1*3 

Wv'RD 

It  EQUAL 

Q 

l.OOvX) 

WORD 

o8  COMPARE 

0 

0.;  30,3 

Wv'RD 

09  EQUAL 

Q 

o.ltlx') 

Wv  HD 

3 CuMPARE 

0 

0.3333 

Wv  RD 

It  5 RETURN 

Q 

1 .VX.OO 

Wv  HD 

* 33  CCMPARE 

9 

0.5000 

Wv  RD 

9<t  EQUAL 

Q 

1 .00 0 ’ 

It  13 

9 lo  6 l*  13  7 1 

> 10 

8 lo 

13  7 

1*  lj  10  >) 

lo 

WORk 

IP  COMMA 

Q » 

0.71  It  3 

Wv  RD 

09  EQUAL 

Q 

0.1*0" 

Wv  RD 

33  COMPARE 

Q 

1.1067 

Wv  KD 

1*  EQUAL 

Q 

0.3  ' \l 

word 

. 3 COMPARE 

Q “ 

l.MJOO 

W(  KD 

1*7  IUiVT 

Q 

1 . ll  ( • / 

Wv'RD 

- p8  CV.T-i'Vvl'3 

Q 

0.6  S' It 

Word 

0*>  EQUAL 

Q 

0.1*00  > 

WORD 

‘33  COMPARE 

Q 

0.7778 

Wv'RD 

It  EQUAL 

Q 

0.03  Vi 

word
10
13  7
10
Q
Q
Wv  RD
Q
Wv  KD
Q

O.'v'OU
.'O  COMPARE

0 

1 • 3730 

UVitil 

1*7  IK’VT
Wv'RD

I'M  COMPARE 

0 

0.7073 

Wv  KD 

."1  EQUAL 

Q 

0.1*.  KX> 

WORD 

33  COMPARE 

Q 

O.i'38  > 

1 Wv'RD 

•t  equal 

Q 

O.R333 

Wv'RD 

3 COMPARE 

Q - 

O.’WX) 

Wv’RD 

It  / IR1VT 

Q 

0.8333 

l»  13 

>)  11  8 t It  13  B 

7 10
1 3 7

It  13  lvD  1 

Wv'RD 

3.3  compare 

0 

l.llt  > 

W.  RD 

. I Ev;"A! 

Q 

0..  '00 

Wv'RD 

oO  CvMI'AlvE 

Q 

1 

Wv  RD 

It  EQUAL 

Q 

o,!.'1  ' 

Wv  RD 

31  MINUS 

9 

1.1.050 

Wv  KD 

it 1 v return 

Q 

l.vwo 

WORD
9

0.83  33 

WORD 

09  EQUAL 

Q
Wl  RD

33  COMPARE 

9 

l.vXXW 

Wv  KD
Q
W>  RD
Wv  RD
Q
Table 10 (Continued)


ZERO  TIMES


13  7 

10  B 9 8 

9 

10  7 

12 

It  13 

9 

8 

11  6 

.'U 

1 12 

WORD 

55  ZERO 

Cl 

- 0.8333 

WORD 

IS 

31 

TINES 

Q 

0.6667 

VR  RD 

2 

30  ZERO 

Q 

« 1.0000 

Wv  RD 

56 

TIMES 

0 

0.6667 

KD 

1U  HIKE 

k 

1 . . '00 

Wv  KD 

G 

TOES 

Q 

0.8333 

WORD 

55  ZERO 

Q 

- O.B333 

WORD 

56 

TINES 

Q - 

0.4444 

WORD 

14  HI HE 

0 

1.  U ) 

W\  RD 

- 

31 

TIMES 

Q - 

0.5000 

WORD 

30  ZERO 

0 

- 1.0000 

WORD 

6 

TIMES 

Q 

0.5556' 

13  7 

9 

8 9 10 

12 

't  13 

6 

11  lit  1 

12 

Wi  RD 

30  ZERO 

Q 

* 0.0000 

WORD 

31 

TINES 

Q - 

0.1667 

WORD 

55  ZERO 

0 

- O.3o33 

K\  KD 

56 

TOES 

Q 

1.0000 

word 

* 

07  CHE 

Q 

1.0000 

WORD 

- 

6 

TOES 

Q 

1.;  iNX) 

WORD 

30  ZERO 

Q 

- 0.0000 

W.KD 

31 

TOES 

Q 

O.ltoV 

WORD 

a 

55  ZERO 

Q 

- 0.3333 

V.V  KD 

56 

TOES 

Q 

0.8000 

WORD 

: 7 0 HE 

Q 

0.8000 

W.  RD 

6 

TOES 

k 

1.0000 

13  7 

9 

8 9 10 

lo 

It  13 

9 

8 

11  lit 

1 

1? 

WORD 

30  ZERO 

Q 

0 • 0000 

WORD 

31 

TOES 

Q - 

0.3333 

Wi  kd 

55  ZERO 

Q 

0.3333 

W.  RD 

56 

TOES 

Q 

1 .;  ixx> 

WORD 

£-7  'HE 

Q 

- 1 . 0000 

WORD 

- 

6 

TOES 

Q 1 

1.4000 

W,  RD 

= 

30  ZERO 

•1 

0.0000 

WORD 

31 

TOES 

Q 

0.3333 

k\  RD 

55  ZERO 

Q 

- 0.3333 

WORD 

56 

TOES 

Q 

1,0000 

Wv  RD 

C 

:>1  i*JE 

Q 

: 0.80''0 

WORD 

6 

TOES 

Q 

1..000 

13  7 

9 

8 10  9 

10 

10 

It  13 

9 

8 

11  6 

l>t 

1 12 

WORD 

30  ZERO 

Q 

- 0.1667 

W,  RD 

31 

TEES 

Q 

>.6667 

WORD 

55  ZERO 

Q 

= 0.5000 

WORD 

a 

56 

TIMES 

Q - 

0.6067 

WV  RD 

* 

21  ONE 

Q 

- 1..000 

word 

St 

C 

TIMES 

Q * 

0.8333 

WORD 

at 

31  ZERO 

Q 

• 0.16.67 

WORD 

56> 

TIIES 

0 ■> 

0.4444 

WORD 

55  ZERO 

0 

= 0.5000 

W RD 

31 

TIMES 

0 > 

0,5000 

WORD 

■ 

21  ONE 

Q 

l.'OOOO 

W,  RD 

C 

TINES 

Q 

0,5556
Table 10 (Continued)

DIVIDE

SEVEN 

5 13  8 

10  11  7 6 13  10  9 8 

5 

13  7 1*  12  1 

11*  1 

13  8 5 13 

10 

11 

8 7 1*  L 

WORD 

z. 

21*  LOCATE 

Q 

a 

1. 4441* 

W.RD 

= 8 SEVEN 

Q 

1.0000 

WORD 

= 

32  DIVIDE 

Q 

= 

l.t  . 50 

WCRD 

- 58  SEVEN 

Q 

1.1250 

WORD 

= 

28  COMPARE 

Q 

- 

2.0000 

WORD 

33  SEVEN 

Q 

1.500 

WORD 

32  DIVIDE 

Q 

s 

1.5000 

WORD 

8 SCTE1 

Q 

> 0.8750 

WORD 

" 

5 ZERO 

Q 

= 

1.5556 

Wv  KD 

= 58  SEVEN 

Q 

= 0.6/6  ' 

WORD 

* 

51  MINUS 

Q 

* 

2.0000 

Wi.  KD 

= 21  STOP 

Q 

= 1.8333 

13  0 

10 

11  6 13  8 10  987 

5 6 

3£ 

11*  1 

13  8 10  11  5 13 

10 

11  13  10  6 12 

WORD 

a 

32  DIVIDE 

<} 

* 

1.5711* 

WORD 

fa  SEVEN 

Q 

1.2500 

WORD 

* 

58  SEVER 

Q 

•S 

2.0000 

WORD 

33  SEVEN 

Q 

1-3333 

WORD 

- 

1*0  ASSIGN 

3 

- 

2.11*29 

WCRD 

= 58  SEVEN 

Q 

-■  1.3750 

WORD 

£ 

32  DIVIDE 

Q 

a 

0.7692 

WORD 

- 8 SEVEN 

Q 

» 0.8750 

WORD 

- 

2 ORE 

Q 

- 

0.8000 

WORD 

- 58  SEVEN 

Q 

- 0.6750 

WORD 

= 

58  SEVER 

<. 

1.0/69 

word 

33  SEVEN 

Q 

1.6250 

13  8 

I 9 3 5 1*  1? 

il*  1 

13  8 9 5 13 

10 

11  12 

WORD 

s. 

57  DIVIDE 

Q 

s 

1.0000 

WORD 

8 SEVEN 

q 

0.6: 60 

WORD 

= 

21  STOP 

Q 

* 

2.0000 

WORD 

- 58  SEVEN 

Q 

- 0.7500 

WORD 

“ 

1*6  STOP 

Q 

2.0000 

WoiiD 

33  SEVEN 

0. 

1.1.  50 

WCRD 

* 

57  DIVIDE 

Q 

2 . 0000 

WORD 

8 seven 

Q 

0.5  ' ' ) 

WORD 

= 

11*  NINE 

Q 

1 . 50 DO 

WORD 

58  SEVEN 

H. 

0,5  * !© 

WORD 

61*  NINE 

Q 

* 

1.7500 

W(  KD 

71  STOP 

Q 

1,0000 

13  8 

.10 

9’7  5 13  11  10  9 8 

9 

7 

1*  12 

14  1 

13  8 10  5 

13 

10 

11  1. 

WORD 

22  inivt 

Q 

s 

0.9000 

Wi.RD 

8 SKVIN 

c 

• O.c.  6 ' 

WORD 

1*7  INPUT 

Q 

1.2500 

K :RD 

« 68  SEVEN 

Q 

0.7500 

WORD 

7?  I WRIT 

Q 

= 

1.2500 

Word 

33  SEVEN 

Q 

1.1255 

WORD 

s 

27  ONE 

Q 

r 

1.5711* 

word 

* 8 SEVEN 

Q 

0.5x0 

WORD 

& 

55  ZERO 

Cl 

= 

1.5711* 

W RD 

- 68  SEVEN 

r\ 

* 

0.5020 

WCRD 

£ 

5 ZERO 

Q 

=. 

1.8333
= 1*3  EIVE

2.(607
Table 10 (Continued)


34  FALSE 
5b  FALSE 
9 FALSE 


3r'  LOGICAL 
1'  LOGICAL 

60  Logical 


word 

WORD 

WORD 


Wv.  RD 
WORD 
WORD 


3>t  FALSE 
50  RAISE 
1 FALSE 


35  LOGICAL 
10  LOGICAL 
45  RETURN 


WORD 

WORD 

WORD 


Wi.  RD 
Wi  RD 
WvRD 


\k  RD 
WORD 


?4  LOCATE 
05  FORTRAN 
W?  INFUT 


34  FALSE 
50  FALSE 
9 FALSE 


WORD 

WORD 

WORD 


3<t  FALSE 
59  FALSE 
9 FALSE 


35  LOGICAL 
10  LOGICAL 
60  LOGICAL 


WORD 

WORD 

WORD 


34  FALSE 
59  FALSE 
9 FALSE 


35  LOGICAL 
10  LOGICAL 
45  RETURN 


WuKD 

WORD 

word 


Q - 0.0000 

Q = 0.1667 
Q 0.3333 


10  LOGICAL 
35  LOGICAL 
60  LOGICAL 


WORD 

WORD 

WORD 


Wi'RD  - 59  FALSE 

WORD  = 34  FALSE 

WORD  9 FAISE 


35  LOGICAL 
10  LOGICAL 
?8  COMPARE 


WORD 

word 

word 


WORD 

WORD 

WORD 


59  FALSE 
34  FAISE 
9 FALSE
Table 10 (Continued)


COMMA 


36  FOJR 
11  FU'R 
2 CUE 


WORD 

WORD 

WORD 


37  COMMA 
57  DIVIDE 
6?  comma 


WORD 

WORD 

WORD 


36  FaiR 
g ONE 
ii  FaiR 


WORD 

WORD 

WORD 


6l  FaiR 
36  FaiR 
73  FIVE 


12  COMMA 
53  COMPARE 
pfl  COMPARE 


WORP 

WORD 

WORD 


61  FCXIR 
36  FaiR 
73  five 


WORD 

WORD 

WORP 


WORD 

word 

WORD 


12  comma 
32  DIVIDE 
37  COMMA 


6l  FOUR 
36  FaiR 
11  FOUR 


WORD 

WORD 

WORD 


12  COMMA 
53  COMPARE 
3 COMPARE 


WORD 

WORD 

WCKD 


ol  FOUR 
36  FUIR 
11  FOUR 


12  CCMMA 
62  car  A 
32  DIVIDE 


WORD 

WORD 

WORD 


WORD 

WOilD 

WORD 


12  COMMA 
53  COMPARE 
23  COMPARE 


WORD 

WORD 

WORD 


WORD 

WORD 

WORD 


12  COMMA 
5 ZERO 
37  COMMA
Table 10 (Continued)


TAKE 


14  KII  TK 
39  KIKE 
o4  KIKE 


13  TAKE 
3 b TAKE 
63  TAKE 


WORD 

WORD 

WORD 


C4  K11IE 
l4  KIRK 
3 9 KIKE 


WORD 

Wv-’IVU 

WORD 


WORD 

WORD 

WORD 


WORD 

WORD 

WORD 


Q - 0.0000 
Q “ 0.3333 

Q 0,  It  000 


WORD 

WORD 

WORD 


1U  KIKE 
64  KIT  E 
31  HIKE 


WORD 

WORD 

WORD 


WORD  3 1 ZERO 
WORD  - 55  ZERO 

WORD  .6  KIM!: 


13  TAKE 
36  TAKE 
63  TAKE
O.OODJ
WORD

WORD 

WORD 


13  take 

38  TAKE 
63  TAKE 


WORD 

WORD 

WORD
Table 10 (Continued)


ASSIGN 


OPTIONS


13  11 

7 14  1 13 

8987  12 

4 13 

8 9 4 14  2 13 

8 

11  14  1 1? 

WORD  = 

4o  ASSIGN 

Q = 0.8750 

wc  rd 

= 66  OPTIONS 

Q 

0.3333 

WORD  = 

15  ASSIGN 

Q = 0.1429 

WORD 

= 4l  OPTIONS 

Q. 

= 0. 900.0 

WORD  = 

72  INPUT 

Q = 2.3333 

WCRD 

- 13  TAKE 

Q 

1.6667 

WCRD 

= 

40  ASSIGN 

Q = 0.750C 

' WORD 

= 

66  omc::s 

0. 

= 0.2500 

WORD 

= 

14  ASSIGN 

Q = 1.0000 

1 WORD 

= 

4l  OPTIONS 

Q 

= 0.7000 

WORD 

* 

7?  INPUT 

Q = 2.3333 

WCRD 

S 

13  TAKE 

0 

= 2.3333
14  1 13  9

8 

7 12 

4 13 

8 10  4 14  2 13  8 11  6 14  1 12 

WORD 

a 

15  ASSIGN 

Q = 0.5000 

WORD 

= 

66  OPTIONS 

Q 

= 0.3333 

W CRD 

- 

hO  assign 

Q.  = 0.6250 

WORD 

* 

4l  OPTIONS 

V 

- o.oooo 

WORD 

s
Q = 2. 0000

WORD 

“ 

16  OPTIONS 

M. 

1.6667 

WORD 

= 

15  ASSIGN 

Q • 0.3750 

WORD 

- 

66  OPTIONS 

Q 

- 0.1667 

WORD 

= 

40  ASSIGN 

Q - 0.5000 

WCRD 

= 

4l  OPTIONS 

Q 

= 0.6154 

WCRD 

~
WCRD

~ 

20  RETURN 

Q 
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13  7 

14  1 13  9 

8 

7 10  6 12 

4 13 

9 h 
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VoRD 

s 

40  ASSIGN 

Q 0.2000 

WORD = 66 OPTIONS    Q = 0.1538
WORD = 15 ASSIGN     Q = 0.3333
WORD = 41 OPTIONS    Q = 0.4167
WORD = 45 RETURN     Q = 1.6667
WORD = 16 OPTIONS    Q = 1.0000
WORD = 40 ASSIGN     Q = 0.1538
WORD = 66 OPTIONS    Q = 0.1250
WORD = 15 ASSIGN     Q = 0.2500
WORD = 41 OPTIONS    Q = 0.2667
WORD = 45 RETURN     Q = 1.3333
WORD = 45 RETURN     Q = 1.7000

13  10 

14  1 13  9 

8 

10  11  8 7 10  12 

4 13  8 13  4 14  2 13  8 10 

11  6 14  l 12 

WORD = 40 ASSIGN     Q = 1.1250
WORD = 10 LOGICAL    Q = 3.6667
WORD = 15 ASSIGN     Q = 1.4286
WORD = 16 OPTIONS    Q = 4.1667
WORD = 50 FORTRAN    Q = 2.5000
WORD = 35 LOGICAL    Q = 4.1667
WORD = 40 ASSIGN     Q = 1.0000
WORD = 66 OPTIONS    Q = 0.1250
WORD = 15 ASSIGN     Q = 1.2857
WORD = 41 OPTIONS    Q = 0.4286
WORD = 50 FORTRAN    Q = 3.1667
WORD = 45 RETURN     Q = 1.3636

ac.t's 
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Table  10  (C  ntinued) 


SLASH 


1 

1 

14  1 

13  9 8 

4 

14 

1 

12 

14  l 13 

11  19  9 s 

7 10  8 7 4 14  2 12 

i 

WORD = 17 SIX        Q = 0.1429
WORD = 18 SLASH      Q = 1.6667
WORD = 67 SIX        Q = 0.1429
WORD = 43 SLASH      Q = 0.3077
WORD = 42 SIX        Q = 0.7000
WORD = 68 SLASH      Q = 0.4000
WORD = 17 SIX        Q = 0.1429
WORD = 18 SLASH      Q = 0.1667
WORD = 67 SIX        Q = 0.1429
WORD = 43 SLASH      Q = 0.3077
WORD = 42 SIX        Q = 0.7000
WORD = 68 SLASH      Q = 0.4000

14  1 

13  8 ? 

4 

14 

1 

12 

14  1 13  11  9 

83  10  11  14  2 1, 

WORD = 17 SIX        Q = 0.1429

word  - 

48  FIVE 

A A O AT-  A 

H.  1 ✓ “* 

WORD = 42 SIX        Q = 0.1429
WORD = 68 SLASH      Q = 1.0000
WORD = 67 SIX        Q = 0.1429
WORD = 73 FIVE       Q = 1.0000
WORD = 17 SIX        Q = 0.1429
WORD = 48 FIVE       Q = 0.8750
WORD = 42 SIX        Q = 0.1429
WORD = 68 SLASH      Q = 1.0000
WORD = 67 SIX        Q = 0.1429
WORD = 73 FIVE       Q = 1.0000

■J 

14  1 

13  9 5 

4 

14 

1 

12 

14  1 13 

10  9 8 10 

n 7 4 i4  ? 1? 

WORD = 17 SIX        Q = 0.1429
WORD = 68 SLASH      Q = 0.2000
WORD = 67 SIX        Q = 0.1429
WORD = 18 SLASH      Q = 0.4000
WORD = 42 SIX        Q = 0.5000
WORD = 23 FIVE       Q = 0.8000
WORD = 17 SIX        Q = 0.1429
WORD = 68 SLASH      Q = 0.2000
WORD = 67 SIX        Q = 0.1429
WORD = 18 SLASH      Q = 0.4000
WORD = 42 SIX        Q = 0.5000
WORD = 23 FIVE       Q = 0.8000

4 

4 14 

1 

13  3 

4 

14 

1 

12 

14  1 13 

11  10  9 8 

7 10  7 4 14  2 12 

H 

WORD = 17 SIX        Q = 0.2857
WORD = 18 SLASH      Q = 0.2727
WORD = 67 SIX        Q = 0.2857
WORD = 68 SLASH      Q = 0.3000
WORD = 42 SIX        Q = 0.6667
WORD = 43 SLASH      Q = 0.6364
WORD = 17 SIX        Q = 0.2857
WORD = 18 SLASH      Q = 0.2727
WORD = 67 SIX        Q = 0.2857
WORD = 68 SLASH      Q = 0.3000
WORD = 42 SIX        Q = 0.6667
WORD = 43 SLASH      Q = 0.6364

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Table  10  (Continued) 


45  RETURN
20  RETURN
29  EQUAL


WORD 

WORD 

WORD 


WORD 

word 

WORD 


19  EIGHT


44  EIGHT 
69  EIGHT 
19  EIGHT 


45  RETURN 
20  RETURN 
70  RETURN 


WORD 

WORD 

WORD 


WORD 

WORD 

WORD 


WORD 

WORD 

WORD 


19  EIGHT
17  SIX
26  MINUS


Q 0.7S.10 
Q = 1.0000 
Q = 1.0000 


WORD

WORD

WORD


WORD

WORD

WORD


19  EIGHT
26  MINUS
31  TIMES


WORD  = 20  RETURN

WORD  = 70  RETURN
WORD  = 45  RETURN


WORD 

WORD 

WORD 


19  EIGHT
42  SIX
9 FALSE 


20  RETURN 
45  RETURN 
29  EQUAL


WORD 

WORD 

WORD 


19  EIGHT 
42  SIX 
1 MINUS 


WORD 

WORD 

WORD 


20  RETURN 
45  RETURN 
70  EQUAL 


69  EIGHT 
19  EIGHT 
44  EIGHT 


45  RETURN 
29  EQUAL 
20  RETURN 


WORD 

WORD 

WORD 


WORD 

WORD 

WORD 


69  EIGHT 
19  EIGHT 
44  EIGHT


45  RETURN
29  EQUAL
20  RETURN


WORD 

WORD 

WORD 


WORD 

WORD 

WORD 


inuci 


WORD 

WORD 

WORD


WORD

WORD

WORD


71  STOP
21  STOP
46  STOP


WORD

WORD

WORD


WORD

WORD


48  FIVE
73  FIVE
23  FIVE


WORD

WORD

WORD


WORD

WORD

WORD


WORD  = 48  FIVE
WORD  = 73  FIVE
WORD  = 23  FIVE


71  STOP
21  STOP
46  STOP


WORD

WORD

WORD


WORD

WORD

WORD 


Table  10  (Continued)


LOCATE 


FORTRAN


74  LOCATE
49  LOCATE
52  INPUT


Q = 0.4444
Q = 1.0000
Q = 1.0571


WORD

WORD

WORD


75  FORTRAN
25  FORTRAN
50  FORTRAN


74  LOCATE
49  LOCATE
22  INPUT


WORD

WORD

WORD


75  FORTRAN

FORTRAN
50  FORTRAN


49  LOCATE
74  LOCATE
22  INPUT


WORD

WORD

WORD


50  FORTRAN
75  FORTRAN
45  RETURN


WORD

WORD

WORD


74  LOCATE
49  LOCATE
72  INPUT


WORD

WORD

WORD


50  FORTRAN
75  FORTRAN
45  RETURN


WORD

WORD

WORD


25  FORTRAN
50  FORTRAN

75  FORTRAN


74  LOCATE
49  LOCATE
72  INPUT


WORD  = 45  RETURN

WORD  = 55  FORTRAN

WORD  = 75  FORTRAN


WORD

WORD

WORD


49  LOCATE
72  INPUT
74  LOCATE


WORD

WORD

WORD


WORD

WORD

WORD


75  FORTRAN
25  FORTRAN
50  FORTRAN


WORD

WORD

WORD


72  INPUT
?4  LOCATE
55  INPUT


75  FORTRAN
25  FORTRAN
50  FORTRAN


WORD

WORD

WORD


WORD

WORD

WORD


Q = 0.0769
Q = 0.0769

Q = 0.1429


47  INPUT
72  INPUT
22  INPUT


Table  10  (Continued) 


13  7 9 6 6 4 5 

WORD  = 72  INPUT

WORD  = 22  INPUT

WORD  = 47  INPUT

WORD  = 72  INPUT
WORD  = 22  INPUT
WORD  = 47  INPUT


13  11  9 10  7 4 12 

Q = 0.5556
Q = 0.6000

Q = 0.6750

Q = 0.4167
Q = 0.4615
Q = 0.6364


WORD 

WORD 

WoRD 


72  INPUT

47  INPUT


0.3333 
--  0. 6.  50 
0.6667 


WORD 

WORD 

WORD 


- 0.1667 
= O.3636 
0.4167 


WORI 

WO°D 

WORD 


47  INPUT
= 72  INPUT
= 22  INPUT


■ 0.0000 
- 0. . .. 
- 0.3000 


W’.RD 

VMRD 

WORD 


4? 

IITV? 

■ it  h:rt 


--  0 !YTO 
o!l667 
0.2303 


WORD 

WORD 

WORD 


ft 

ft 

ft 


= 0.1009 
0.1000 
- 0.l8l6 


IRR'T 
IIIRJ  X 
HIK'D 


47  INPUT

72  INPUT
22  INPUT


The detected category sequence is also shown for each test word just before the
Q values.

The raw recognition rates for the 100 test utterances were 94% recognition
and 6% errors for Q option #1, and 92% recognition and 8% errors for
Q option #2. These data do not make use of the threshold value provided in
the program. When this provision is used, an advantage appears in the use of
the second method of Q calculation. Figures 11 and 12 show the effect of
varying the threshold value. The figures show the percentage of recognition
error and the non-recognition rate for various values of threshold applied to the
match Q value. By comparing the two curves, it can be seen that if it is
desired to hold the error rate to 2%, for example, then option #1 has a
recognition rate of 59%, a non-recognition rate of 39%, and an error rate of 2%.
Option #2, however, has a recognition rate of 75%, a non-recognition rate of
23%, and an error rate of 2%. Thus, obvious improvements are available by
using the nasal string and error analysis weighted Q calculation provided
for by option #2.
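
The effect of the threshold can be illustrated with a short sketch. It is only an illustration under stated assumptions: Q is treated as a distance-like measure (a smaller value means a closer match), an utterance is left unrecognized when its best Q exceeds the threshold, and the function name and sample data are hypothetical rather than taken from the program described in this report.

```python
from typing import List, Tuple


def score_with_threshold(
    results: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float, float]:
    """Return (recognition %, non-recognition %, error %).

    Each result is (best_q, is_correct) for one test utterance.
    Assumption: a best Q above the threshold means no decision is made.
    """
    if not results:
        raise ValueError("at least one test utterance is required")
    recognized = rejected = errors = 0
    for best_q, is_correct in results:
        if best_q > threshold:
            rejected += 1        # match too poor: non-recognition
        elif is_correct:
            recognized += 1      # accepted and correct
        else:
            errors += 1          # accepted but wrong
    n = len(results)
    return (100.0 * recognized / n, 100.0 * rejected / n, 100.0 * errors / n)


# Hypothetical test data: (best Q, whether the nearest library word was correct)
demo = [(0.1, True), (0.3, True), (0.9, False), (1.2, True)]
tight = score_with_threshold(demo, threshold=0.5)
loose = score_with_threshold(demo, threshold=1.0)
print(tight)  # (50.0, 50.0, 0.0)
print(loose)  # (50.0, 25.0, 25.0)
```

The three rates always sum to 100%, as they do in the figures quoted above. Loosening the threshold turns non-recognitions into either recognitions or errors, which is the trade-off the threshold curves display.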

The relative importance of the nasal category errors can be seen in
Figures 13 and 14. Here, the 25-word vocabulary data has been reworked to
eliminate vocabulary words containing nasals. The results have changed
considerably. The unthresholded recognition rates are now 93,1 for both option 1
and option 2 Q calculation. There still appears to be an advantage with
option 2 when the Q threshold is applied, as shown in Figures 13 and 14; however,
this is due only to the advantages of error analysis weighting of the Q
calculation.


[Figures 11 through 14: recognition, non-recognition, and error rates versus Q threshold; plots not reproduced]


CONCLUSIONS AND RECOMMENDATIONS


The category-sequence word recognition software was pursued to a
relatively high degree of refinement during the program. Analysis of test data
taken at the end of the program has led to the conclusion that the next major
step in recognizer performance will come with improvements in the phonemic
category recognition hardware. However, work directed at improving the
phonemic category recognizers during the course of the program has led to the
conclusion that extensive changes must be made in the category detectors
before a significant reduction in their contribution to word errors can be made.
It is felt that the implementation of these changes will require a complete
redesign and rebuilding of the phonemic category recognizer. These changes
involve:


1. The elimination of certain design limitations.

2. The reduction of random errors caused by the parameter extractor.

3. The reduction of random errors caused by the category extractor.

4. The reduction of the recognizer's sensitivity to external acoustic noise.


While  each  of  these  areas  is  important  to  the  overall  performance,  it 
is  felt  that  the  first  area,  design  limitations,  is  the  most  important  to  the 
nasal  category  recognition  problem  inasmuch  as  it  will  permit  the  inclusion  of 
additional  information  upon  which  to  base  the  nasal  recognition  decision. 

Four design limitation changes are felt to be necessary. These are:


1. New Category Extractor


The most severe deficiency in the present recognizer is the
inability to recognize nasals with satisfactory reliability. The
inclusion of two new phonemic categories should greatly help to
reduce this problem. These categories are really the semi-stop
features encountered in the articulation of nasals. These semi-stops
are divided into semi-onset and semi-release classes.
While these features do not uniquely identify the occurrence of
a nasal, they always occur when a nasal occurs. (Unfortunately,
semi-stops are also found during the articulation of l's and r's
in certain phonemic environments.) This overlap is not severe
because the occurrence of the feature is sufficiently consistent
to appear in both test and library words. Thus, their use as a
category should provide back-up for the existing nasal category.
It is also suggested that the intervocalic pause be carried
as a separate new category. In the current logic it is used as
a feature. This tends to rob the word logic of a reliable bit
of information relating to the phonetic structure of the word.


2. Inclusion of Phoneme Durational Information


The present category extractors do not include phoneme duration
as part of the information transmitted to the word logic.
It has become evident from working with the detected strings of
phonemic categories that the inclusion of vowel and fricative
duration would provide significant help at the word logic level
in determining the relative importance of vowel and fricative
detections. At present, for example, many extra vowels are detected.
The relative importance of these vowels cannot be determined
because of the lack of duration information. (In general,
the longer the vowel, the more significant.) It is suggested that
such information can be incorporated by providing the phonemic
category logic with the capability of repeating a category detection
in proportion to its duration. The word logic may then
weight the relative importance of any detection by the number of
repetitions. Thus, the category sequence might
become 13-3-8-9-9-10-14-1-12 or 13-3-8-8-9-10-14-1-12, depending
upon the relative duration of the 8 and 9 categories.
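
The repetition scheme suggested above can be sketched as follows. This is a minimal illustration, not the recognizer's logic: the function name, the 50 ms repetition unit, and the durations in the example are all hypothetical, and only the idea of repeating a category in proportion to its duration comes from the text.

```python
from typing import List, Tuple


def expand_by_duration(
    detections: List[Tuple[int, int]], unit_ms: int
) -> List[int]:
    """Repeat each detected category once per full `unit_ms` of duration.

    Each detection is (category number, duration in ms). Every detection
    is emitted at least once, so short detections are never dropped.
    """
    sequence: List[int] = []
    for category, duration_ms in detections:
        repeats = max(1, duration_ms // unit_ms)
        sequence.extend([category] * repeats)
    return sequence


# Hypothetical durations: category 9 lasts long enough to be repeated.
detections = [(13, 40), (3, 30), (8, 50), (9, 120),
              (10, 40), (14, 30), (1, 60), (12, 40)]
expanded = expand_by_duration(detections, unit_ms=50)
print("-".join(str(c) for c in expanded))  # 13-3-8-9-9-10-14-1-12
```

The word logic could then count consecutive repetitions of a category to weight that detection, without any change to the format of the category string itself.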


3. Fricative Detector Modifications


Fricative detections should be made on an instantaneous parameter
basis rather than on an average parameter value during the
fricative interval, as is done in the current logic. The present
fricative detectors produce one fricative decision per fricative
utterance, and thus only represent the average value of the fricative.
It has become clear that the fricatives should be detected
in much the same way as the vowels. Thus, if a change in the fricative
value occurs during the articulation, an indication of that change
should be sent to the word logic.
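
The difference between the two detection schemes can be shown with a small sketch. The quantizer boundaries, frame values, and function names below are hypothetical; the sketch only contrasts one decision made on the average parameter value with decisions emitted whenever the quantized value changes.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantize(value: float, boundaries: Sequence[float]) -> int:
    """Map a parameter value to a level index using sorted boundaries."""
    return bisect_right(boundaries, value)


def average_decision(values: Sequence[float], boundaries: Sequence[float]) -> int:
    """One decision per fricative interval, based on the average value."""
    return quantize(sum(values) / len(values), boundaries)


def change_decisions(values: Sequence[float], boundaries: Sequence[float]) -> List[int]:
    """Emit a decision at the start and again whenever the level changes."""
    decisions: List[int] = []
    for value in values:
        level = quantize(value, boundaries)
        if not decisions or decisions[-1] != level:
            decisions.append(level)
    return decisions


# Hypothetical parameter track that shifts part way through the fricative.
frames = [2.0, 2.1, 2.2, 4.0, 4.1, 4.2]
boundaries = [3.0]
print(average_decision(frames, boundaries))  # 1
print(change_decisions(frames, boundaries))  # [0, 1]
```

In this example the averaged decision reports only one level and hides the shift, while the instantaneous scheme passes both levels to the word logic in order.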


4. Additional Parameter


An additional parameter has been found that improves the
discrimination of the fθh category. This parameter is a measure of
the randomness of the period of the clipped speech signal. The
more random the period, the more likely that an fθh category is
present. This information should be incorporated as a supplement
to the present SEF spectral information.
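
One way such a randomness measure might be computed is sketched below. The report does not say how the measurement is made, so the choice here is an assumption: the period of the clipped signal is taken as the interval between successive positive-going zero crossings, and randomness is expressed as the coefficient of variation of those intervals. All names are hypothetical.

```python
import math
from typing import List, Sequence


def zero_crossing_periods(samples: Sequence[float], sample_rate_hz: float) -> List[float]:
    """Intervals (seconds) between successive positive-going zero crossings.

    Infinite clipping keeps only the sign of the waveform, so these
    intervals are the periods of the clipped signal.
    """
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    return [(b - a) / sample_rate_hz for a, b in zip(crossings, crossings[1:])]


def period_randomness(periods: Sequence[float]) -> float:
    """Coefficient of variation of the periods (0.0 when perfectly regular)."""
    if len(periods) < 2:
        return 0.0
    mean = sum(periods) / len(periods)
    variance = sum((p - mean) ** 2 for p in periods) / len(periods)
    return math.sqrt(variance) / mean


# A steady 100 Hz tone sampled at 8 kHz has a regular period.
tone = [math.sin(2 * math.pi * 100 * n / 8000 + 0.3) for n in range(800)]
steady = period_randomness(zero_crossing_periods(tone, 8000))

# Hypothetical irregular periods, as might occur in a noise-like fricative.
irregular = period_randomness([0.010, 0.020, 0.005, 0.015])
print(steady < irregular)  # True
```

A threshold on this value, chosen experimentally, could then supplement the spectral information when the category decision is made.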


The elimination of random errors caused by the parameter extractor would
involve three changes:


The SEF extractor output contains a certain component of noise
that is difficult to filter out with normal techniques. This is
because the signal is characterized by large, very fast changes
that contain significant information as well as small fast changes
that are noise. Filtering that removes the small fast changes
also eliminates the large fast changes. A programmable filter has
been breadboarded that heavily filters the signal in the absence
of large changes and greatly increases its bandwidth during the
fast changes. This technique shows a major reduction in the signal
noise without losing any significant information.
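
The behavior of such a filter can be approximated in software by a first-order smoother whose coefficient is switched by the size of the change. This is a sketch of the idea only, not the breadboarded circuit; the coefficients, the jump threshold, and the function name are hypothetical.

```python
from typing import List, Sequence


def adaptive_smooth(
    signal: Sequence[float],
    slow_alpha: float = 0.05,
    fast_alpha: float = 0.8,
    jump_threshold: float = 1.0,
) -> List[float]:
    """First-order low-pass filter with a switchable bandwidth.

    While the input stays within `jump_threshold` of the filter output,
    the heavy smoothing coefficient is used. When the input departs by
    more than that, the coefficient is raised so the output follows fast.
    """
    if not signal:
        return []
    output: List[float] = []
    y = signal[0]
    for x in signal:
        alpha = fast_alpha if abs(x - y) > jump_threshold else slow_alpha
        y += alpha * (x - y)
        output.append(y)
    return output


# Small fast changes (noise) followed by a large fast change (information).
noisy_step = [0.0, 0.2, -0.2, 0.2, -0.2] + [10.0] * 5
smoothed = adaptive_smooth(noisy_step)
print([round(v, 3) for v in smoothed])
```

With these settings the small alternations are reduced to about a twentieth of their size, while the output comes within 5% of the large step after two samples.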


W> 


The present amplitude detector uses a peak detector envelope
filtering technique. It has been found that low-pass filtering
produces a more consistent signal response. It is suggested that
this filtering system be adopted.


SEF Vowel and Fricative Quantizers


The proposed change in the SEF filter makes possible a better
SEF quantizer that should considerably reduce the number of extra
vowel detections. In the past, because of SEF noise, it was
impossible to employ hysteresis in the threshold detectors. By
using the SEF filter, a good deal of the noise present in the output
of the quantizers can be eliminated by the use of hysteresis.
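
The benefit of hysteresis in a threshold detector can be seen in a brief sketch. The threshold, hysteresis width, and input values are hypothetical; the code simply compares how often the detector output changes with and without hysteresis when the input hovers near the threshold.

```python
from typing import List, Sequence


def threshold_with_hysteresis(
    values: Sequence[float], threshold: float, hysteresis: float
) -> List[bool]:
    """Two-level detector that switches on above threshold + hysteresis
    and switches off only below threshold - hysteresis."""
    state = False
    states: List[bool] = []
    for value in values:
        if not state and value > threshold + hysteresis:
            state = True
        elif state and value < threshold - hysteresis:
            state = False
        states.append(state)
    return states


def count_transitions(states: Sequence[bool]) -> int:
    """Number of times the detector output changes."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Hypothetical parameter track that jitters around a threshold of 1.0.
track = [0.9, 1.1, 0.95, 1.05, 0.9, 1.6, 1.4, 0.95, 1.05, 0.4]
plain = count_transitions(threshold_with_hysteresis(track, 1.0, 0.0))
with_hysteresis = count_transitions(threshold_with_hysteresis(track, 1.0, 0.3))
print(plain, with_hysteresis)  # 8 2
```

Each spurious transition in the plain detector would appear to the word logic as an extra detection, which is the effect hysteresis is intended to suppress.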


There are two changes that would help to eliminate errors in the
phonemic category detectors. These are:


1. Background Noise Level Shift for the Stop Category Detectors


The stop category detectors utilize the derivative amplitude
parameter information to make decisions. The amplitude of the
derivative is a function of the background level; thus, stop
category decisions based upon derivative amplitude are modified
by background level. It is suggested that background level can
be compensated for by moving the stop detector threshold in
accordance with a measurement of the background level.
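
A simple form of this compensation is sketched below. The report does not give the relationship between background level and threshold, so a linear shift is assumed here purely for illustration; the base threshold, gain, and function names are hypothetical.

```python
def stop_threshold(base_threshold: float, background_level: float, gain: float) -> float:
    """Threshold moved in proportion to the measured background level.

    Assumption: a linear shift. The actual relationship would have to be
    determined by measurement on the recognizer hardware.
    """
    return base_threshold + gain * background_level


def detect_stop(
    derivative_amplitude: float,
    background_level: float,
    base_threshold: float = 1.0,
    gain: float = 0.5,
) -> bool:
    """Stop decision made against the background-compensated threshold."""
    return derivative_amplitude > stop_threshold(base_threshold, background_level, gain)


# The same derivative amplitude is accepted in quiet and rejected in noise.
quiet = detect_stop(1.2, background_level=0.0)
noisy = detect_stop(1.2, background_level=1.0)
print(quiet, noisy)  # True False
```

The background level itself could be measured during intervals that the amplitude parameter classifies as silence.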


2.  Vowel-Fricative  Detector  Phonetic  Environment  Error 


The vowel-fricative detector is based upon the detection of
fixed minimum durations of voiced and unvoiced energy.
Unfortunately, these minimum durations vary depending upon the
phonetic environment. It is suggested that these various
environment conditions be used to alter the minimum time duration
required to make vowel-fricative category decisions.


One change is recommended to reduce the noise sensitivity of the
recognizer. This involves a modification of the speech/silence detector so that
the presence of a minimum duration period of voicing information is required
to establish the presence of a spoken word. If this requirement is not met before
the amplitude parameter returns to silence, the word logic is reset.
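
The proposed speech/silence logic can be sketched as a small state machine. The sketch assumes that a burst of sound is discarded (the word logic reset) when it ends without containing the required run of voicing; the frame representation, the function name, and the minimum run length are hypothetical.

```python
from typing import List, Tuple


def accepted_words(
    frames: List[Tuple[bool, bool]], min_voiced_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of bursts accepted as words.

    Each frame is (amplitude_above_silence, voiced). A burst is accepted
    only if it contains at least `min_voiced_frames` consecutive voiced
    frames before the amplitude returns to silence.
    """
    words: List[Tuple[int, int]] = []
    start = None
    run = best = 0
    for i, (active, voiced) in enumerate(frames):
        if active:
            if start is None:
                start, run, best = i, 0, 0   # a new burst begins
            run = run + 1 if voiced else 0   # length of current voiced run
            best = max(best, run)
        elif start is not None:
            if best >= min_voiced_frames:
                words.append((start, i))     # enough voicing: a word
            start = None                     # otherwise the logic is reset
    if start is not None and best >= min_voiced_frames:
        words.append((start, len(frames)))
    return words


# A two-frame unvoiced click, silence, then a five-frame voiced word.
frames = ([(True, False)] * 2 + [(False, False)] * 2 +
          [(True, True)] * 5 + [(False, False)])
print(accepted_words(frames, min_voiced_frames=3))  # [(4, 9)]
```

The click is rejected because it contains no voicing, so short acoustic noise bursts never reach the word recognition stage.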
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Report  on  speech  recopnirer  experiment  s 




